Comparison of channel normalisation techniques for automatic speech recognition over the phone
Authors
Abstract
We compared three different channel normalisation (CN) methods in the context of a connected digit recognition task over the phone: cepstrum mean subtraction (CMS), RASTA filtering and the Gaussian dynamic cepstrum representation (GDCR). Using a small set of context-independent (CI) continuous Gaussian mixture hidden Markov models (HMMs), we found that CMS and RASTA outperformed the GDCR technique. We show that the main cause of the superiority of CMS over RASTA is the phase distortion introduced by the RASTA filter. Recognition results for a phase-corrected RASTA technique are identical to those of CMS. Our results indicate that an ideal cepstrum-based CN method should (1) effectively remove the DC component, (2) at least preserve modulation frequencies in the range 2-16 Hz and (3) introduce no phase distortion when CI HMMs are used for recognition.
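As a rough illustration of requirement (1), the sketch below shows cepstrum mean subtraction only; it does not cover RASTA or GDCR and is not the authors' implementation. It assumes that a fixed telephone channel appears as a constant additive offset on every cepstral frame, and the function name `cepstral_mean_subtraction` and all numeric values are made up for the example.

```python
from typing import List

Frames = List[List[float]]  # one list of cepstral coefficients per frame


def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient over time: the DC component of its trajectory.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy utterance: 4 frames x 2 coefficients (hypothetical values).
clean = [[1.0, -0.5], [2.0, 0.5], [0.0, 1.5], [1.0, 0.5]]

# Assumption: a fixed channel adds the same offset to every cepstral frame.
channel_offset = [0.7, -0.3]
distorted = [[c + h for c, h in zip(frame, channel_offset)] for frame in clean]

normalised_clean = cepstral_mean_subtraction(clean)
normalised_distorted = cepstral_mean_subtraction(distorted)
```

Under that assumption, `normalised_clean` and `normalised_distorted` agree up to floating-point rounding, because the constant offset is absorbed into the per-utterance mean. Since the mean is computed over the whole utterance and subtracted from every frame alike, this operation introduces no left-context dependency of the kind attributed to the RASTA filter.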
Similar resources
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among these, lip-reading techniques have encountered many challenges for speech recognition; one of the challenges b...
Cross-bandwidth adaptation for ASR systems
Mismatches between application and training data greatly reduce the performance of automatic speech recognition (ASR) systems. However, collecting suitable amounts of in-domain and application-specific data for training is resource intensive and may not be feasible for resource-scarce environments. Utilising limited amounts of in-domain data and a combination of feature normalisation and acoust...
Channel Normalisation Using Phase-corrected RASTA
Recently, we proposed an extension to the classical RASTA technique. The new method consists of classical RASTA filtering followed by a phase correction operation. In this manner, the influence of the communication channel is as effectively removed as with classical RASTA. However, our proposal does not introduce a left-context dependency like classical RASTA. Therefore the new method is better sui...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Speech Recognition over Mobile Networks
This chapter addresses issues associated with automatic speech recognition (ASR) over mobile networks, and introduces several techniques for improving speech recognition performance. One of these issues is the performance degradation of ASR over mobile networks that results from distortions produced by speech coding algorithms employed in mobile communication systems, transmission errors occurr...